KEGG as a reference resource for gene and protein annotation
نویسندگان
چکیده
KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.
منابع مشابه
Babel's tower revisited: a universal resource for cross-referencing across annotation databases
MOTIVATION Annotation databases are widely used as public repositories of biological knowledge. However, most of these resources have been developed by independent groups which used different designs and different identifiers for the same biological entities. As we show in this article, incoherent name spaces between various databases represent a serious impediment to using the existing annotat...
متن کاملMPSS: an integrated database system for surveying a set of proteins
SUMMARY We design and implement an integrated database system called 'multi-protein survey system' (MPSS), which provides a platform to retrieve information about many proteins at a time. This system integrates several important and widely used databases including SwissProt, TrEMBL, PDB and InterPro, plus useful references such as GO and KEGG to other databases. Users may submit a group of prot...
متن کاملCOFECO: composite function annotation enriched by protein complex data
COFECO is a web-based tool for a composite annotation of protein complexes, KEGG pathways and Gene Ontology (GO) terms within a class of genes and their orthologs under study. Widely used functional enrichment tools using GO and KEGG pathways create large list of annotations that make it difficult to derive consolidated information and often include over-generalized terms. The interrelationship...
متن کاملEfficient use of a Protein Structure Annotation Database
In this work, a multitude of data on structure and function of proteins is compiled and subsequently applied to the analysis of atomic packing. Structural analyses often require specific protein datasets, based on certain properties of the proteins, such as sequence features, protein folds, or resolution. Compiling such sets using current web resources is tedious because the necessary data are ...
متن کاملMassNet: a functional annotation service for protein mass spectrometry data
Although mass spectrometry has been frequently used to identify proteins, there are no web servers that provide comprehensive functional annotation of those identified proteins. It is necessary to provide such web service due to a rapid increase in the data. We, therefore, introduce MassNet, which provides (i) physico-chemical analysis information, (ii) KEGG pathway assignment (iii) Gene Ontolo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Nucleic acids research
دوره 44 D1 شماره
صفحات -
تاریخ انتشار 2016